{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Forecasting Simple Time-Series"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This example shows how to estimate time-series that are relatively constant or have a defined trend, but other than that do not have any patterns over time. If there are other patterns in the data, such as seasonality, see Forecasting Complex Time-Series."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load in the Data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We'll load in the data with `pandas`, which should be review."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"df = pd.read_excel('Sales COGS.xlsx')"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
Unnamed: 0
\n",
"
2017-12-31 00:00:00
\n",
"
2018-12-31 00:00:00
\n",
"
2019-12-31 00:00:00
\n",
"
\n",
" \n",
" \n",
"
\n",
"
0
\n",
"
Sales
\n",
"
1000
\n",
"
1200
\n",
"
1100
\n",
"
\n",
"
\n",
"
1
\n",
"
Cost of Goods Sold
\n",
"
620
\n",
"
700
\n",
"
650
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Unnamed: 0 2017-12-31 00:00:00 2018-12-31 00:00:00 \\\n",
"0 Sales 1000 1200 \n",
"1 Cost of Goods Sold 620 700 \n",
"\n",
" 2019-12-31 00:00:00 \n",
"0 1100 \n",
"1 650 "
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here we should use Sales, Cost of Goods sold as the index. Load in by setting the index column."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
2017-12-31
\n",
"
2018-12-31
\n",
"
2019-12-31
\n",
"
\n",
" \n",
" \n",
"
\n",
"
Sales
\n",
"
1000
\n",
"
1200
\n",
"
1100
\n",
"
\n",
"
\n",
"
Cost of Goods Sold
\n",
"
620
\n",
"
700
\n",
"
650
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" 2017-12-31 2018-12-31 2019-12-31\n",
"Sales 1000 1200 1100\n",
"Cost of Goods Sold 620 700 650"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = pd.read_excel('Sales COGS.xlsx', index_col=0)\n",
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now that looks better."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Plot Data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For an effective plot, we will need to transpose the data, so that the dates are the index (x-axis on plot), and the data types are columns (series on plot). Thankfully this is as simple as `df.T`."
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"%matplotlib inline\n",
"\n",
"df.T.plot.line()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Forecast Using Most Recent Value\n",
"\n",
"This is the simplest forecast, just keep it the same as it was. \n",
"\n",
"Right now we have the dates as the columns. So access the columns and take the max to find the latest date."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Timestamp('2019-12-31 00:00:00')"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"last_date = df.columns.max()\n",
"last_date"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now select the values which have the latest date as the forecast"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The forecasted value for sales is $1,100 and for COGS is $650\n"
]
}
],
"source": [
"fcst_sales = df.loc['Sales'][last_date]\n",
"fcst_cogs = df.loc['Cost of Goods Sold'][last_date]\n",
"print(f'The forecasted value for sales is ${fcst_sales:,.0f} and for COGS is ${fcst_cogs:,.0f}')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Forecast Using Average\n",
"\n",
"We have already seen how to take averages of `pandas` `Series`:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The forecasted value for sales is $1,100 and for COGS is $657\n"
]
}
],
"source": [
"fcst_sales = df.loc['Sales'].mean()\n",
"fcst_cogs = df.loc['Cost of Goods Sold'].mean()\n",
"print(f'The forecasted value for sales is ${fcst_sales:,.0f} and for COGS is ${fcst_cogs:,.0f}')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Forecast Using Trend\n",
"\n",
"There are two methods to forecast using the trend."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Trend Method 1: By Regression\n",
"\n",
"We will estimate the following regression model:\n",
"$$y_t = a + \\beta t + \\epsilon_t$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Create DataFrame with $t$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First we need to create a `DataFrame` which has a column for the $y$ and a column for the $t$:.\n",
"\n",
"To do this, first we can create a `DataFrame` from the `Series` we want to forecast."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
Sales
\n",
"
\n",
" \n",
" \n",
"
\n",
"
2017-12-31
\n",
"
1000
\n",
"
\n",
"
\n",
"
2018-12-31
\n",
"
1200
\n",
"
\n",
"
\n",
"
2019-12-31
\n",
"
1100
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Sales\n",
"2017-12-31 1000\n",
"2018-12-31 1200\n",
"2019-12-31 1100"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"for_fcst_df = pd.DataFrame(df.loc['Sales'])\n",
"for_fcst_df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can use `reset_index(drop=True)` to get rid of the date index."
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
Sales
\n",
"
\n",
" \n",
" \n",
"
\n",
"
0
\n",
"
1000
\n",
"
\n",
"
\n",
"
1
\n",
"
1200
\n",
"
\n",
"
\n",
"
2
\n",
"
1100
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Sales\n",
"0 1000\n",
"1 1200\n",
"2 1100"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"for_fcst_df = for_fcst_df.reset_index(drop=True)\n",
"for_fcst_df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can now use `reset_index()` without the `drop=True` to get this new 0, 1, 2 index as a column."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
"
],
"text/plain": [
" t Sales\n",
"0 0 1000\n",
"1 1 1200\n",
"2 2 1100"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"for_fcst_df = for_fcst_df.rename(columns={'index': 't'})\n",
"for_fcst_df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's wrap this all up in a function as we'll need to use this for COGS as well, and for the CAGR approaches."
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
t
\n",
"
Cost of Goods Sold
\n",
"
\n",
" \n",
" \n",
"
\n",
"
0
\n",
"
0
\n",
"
620
\n",
"
\n",
"
\n",
"
1
\n",
"
1
\n",
"
700
\n",
"
\n",
"
\n",
"
2
\n",
"
2
\n",
"
650
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" t Cost of Goods Sold\n",
"0 0 620\n",
"1 1 700\n",
"2 2 650"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"def for_forecast_df_from_orig_df(orig_df, series_name):\n",
" \"\"\"\n",
" From a DataFrame where index is name of series to be forecasted and columns are time periods, create a \n",
" DataFrame with two columns, t in periods and the value to be forecasted, and rows are time periods.\n",
" \"\"\"\n",
" for_fcst_df = pd.DataFrame(orig_df.loc[series_name])\n",
" for_fcst_df = for_fcst_df.reset_index(drop=True).reset_index()\n",
" for_fcst_df = for_fcst_df.rename(columns={'index': 't'})\n",
" return for_fcst_df\n",
"\n",
"for_forecast_df_from_orig_df(df, 'Cost of Goods Sold')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Now Run Regression"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This should be review from the cost of equity exercise."
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"c:\\users\\admin\\.virtualenvs\\fin-model-course-gvj3lsuv\\lib\\site-packages\\statsmodels\\stats\\stattools.py:71: ValueWarning: omni_normtest is not valid with less than 8 observations; 3 samples were given.\n",
" \"samples were given.\" % int(n), ValueWarning)\n"
]
},
{
"data": {
"text/html": [
"
\n",
"
OLS Regression Results
\n",
"
\n",
"
Dep. Variable:
Sales
R-squared:
0.250
\n",
"
\n",
"
\n",
"
Model:
OLS
Adj. R-squared:
-0.500
\n",
"
\n",
"
\n",
"
Method:
Least Squares
F-statistic:
0.3333
\n",
"
\n",
"
\n",
"
Date:
Tue, 19 Nov 2019
Prob (F-statistic):
0.667
\n",
"
\n",
"
\n",
"
Time:
12:47:39
Log-Likelihood:
-17.033
\n",
"
\n",
"
\n",
"
No. Observations:
3
AIC:
38.07
\n",
"
\n",
"
\n",
"
Df Residuals:
1
BIC:
36.26
\n",
"
\n",
"
\n",
"
Df Model:
1
\n",
"
\n",
"
\n",
"
Covariance Type:
nonrobust
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
coef
std err
t
P>|t|
[0.025
0.975]
\n",
"
\n",
"
\n",
"
const
1050.0000
111.803
9.391
0.068
-370.597
2470.597
\n",
"
\n",
"
\n",
"
t
50.0000
86.603
0.577
0.667
-1050.390
1150.390
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
Omnibus:
nan
Durbin-Watson:
3.000
\n",
"
\n",
"
\n",
"
Prob(Omnibus):
nan
Jarque-Bera (JB):
0.531
\n",
"
\n",
"
\n",
"
Skew:
0.707
Prob(JB):
0.767
\n",
"
\n",
"
\n",
"
Kurtosis:
1.500
Cond. No.
2.92
\n",
"
\n",
"
Warnings: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified."
],
"text/plain": [
"\n",
"\"\"\"\n",
" OLS Regression Results \n",
"==============================================================================\n",
"Dep. Variable: Sales R-squared: 0.250\n",
"Model: OLS Adj. R-squared: -0.500\n",
"Method: Least Squares F-statistic: 0.3333\n",
"Date: Tue, 19 Nov 2019 Prob (F-statistic): 0.667\n",
"Time: 12:47:39 Log-Likelihood: -17.033\n",
"No. Observations: 3 AIC: 38.07\n",
"Df Residuals: 1 BIC: 36.26\n",
"Df Model: 1 \n",
"Covariance Type: nonrobust \n",
"==============================================================================\n",
" coef std err t P>|t| [0.025 0.975]\n",
"------------------------------------------------------------------------------\n",
"const 1050.0000 111.803 9.391 0.068 -370.597 2470.597\n",
"t 50.0000 86.603 0.577 0.667 -1050.390 1150.390\n",
"==============================================================================\n",
"Omnibus: nan Durbin-Watson: 3.000\n",
"Prob(Omnibus): nan Jarque-Bera (JB): 0.531\n",
"Skew: 0.707 Prob(JB): 0.767\n",
"Kurtosis: 1.500 Cond. No. 2.92\n",
"==============================================================================\n",
"\n",
"Warnings:\n",
"[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n",
"\"\"\""
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import statsmodels.api as sm\n",
"\n",
"model = sm.OLS(for_fcst_df['Sales'], sm.add_constant(for_fcst_df['t']), hasconst=True)\n",
"results = model.fit()\n",
"results.summary()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now get the intercept and $\\beta$ from the regression results."
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [],
"source": [
"intercept = results.params['const']\n",
"beta = results.params['t']"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1050.0"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"intercept"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"49.99999999999982"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"beta"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Now Predict from Regression Results"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1199.9999999999995"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fcst_sales = intercept + beta * 3\n",
"fcst_sales"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's wrap up the regression approach into a function to use it with COGS as well."
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1199.9999999999995"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"def intercept_and_t_beta_from_for_forecast_df(for_fcst_df, series_name):\n",
" \"\"\"\n",
" Calculates intercept and beta of time periods from DataFrame set up for forecasting\n",
" \"\"\"\n",
" model = sm.OLS(for_fcst_df[series_name], sm.add_constant(for_fcst_df['t']), hasconst=True)\n",
" results = model.fit()\n",
" intercept = results.params['const']\n",
" beta = results.params['t']\n",
" return intercept, beta\n",
"\n",
"def predict_from_intercept_beta_and_t(intercept, beta, t):\n",
" \"\"\"\n",
" Predicts value in period t based off regression intercept and beta\n",
" \"\"\"\n",
" return intercept + beta * t\n",
"\n",
"intercept, beta = intercept_and_t_beta_from_for_forecast_df(for_fcst_df, 'Sales')\n",
"fcst_sales = predict_from_intercept_beta_and_t(intercept, beta, 3)\n",
"fcst_sales"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Entire Approach for COGS Using Functions"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's write one more function to do the entire forecast, putting everything together."
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The forecasted value for sales is $1,200 and for COGS is $687\n"
]
}
],
"source": [
"def forecast_trend_reg(df, series_name, t):\n",
" \"\"\"\n",
" Full workflow of forecasting trend via regression\n",
" \"\"\"\n",
" for_fcst_df = for_forecast_df_from_orig_df(df, series_name)\n",
" intercept, beta = intercept_and_t_beta_from_for_forecast_df(for_fcst_df, series_name)\n",
" fcst = predict_from_intercept_beta_and_t(intercept, beta, t)\n",
" return fcst\n",
"\n",
"fcst_sales = forecast_trend_reg(df, 'Sales', 3)\n",
"fcst_cogs = forecast_trend_reg(df, 'Cost of Goods Sold', 3)\n",
"print(f'The forecasted value for sales is ${fcst_sales:,.0f} and for COGS is ${fcst_cogs:,.0f}')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Trend Method 2: By CAGR"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's work off the `DataFrame` set up for forecasting from the regression approach."
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
t
\n",
"
Sales
\n",
"
\n",
" \n",
" \n",
"
\n",
"
0
\n",
"
0
\n",
"
1000
\n",
"
\n",
"
\n",
"
1
\n",
"
1
\n",
"
1200
\n",
"
\n",
"
\n",
"
2
\n",
"
2
\n",
"
1100
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" t Sales\n",
"0 0 1000\n",
"1 1 1200\n",
"2 2 1100"
]
},
"execution_count": 40,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"for_fcst_df = for_forecast_df_from_orig_df(df, 'Sales')\n",
"for_fcst_df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We want to calculate $$\\frac{y_T}{y_0}^{\\frac{1}{n}} - 1$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can use `.iloc` (integer location) to get the first and last values of sales."
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1000"
]
},
"execution_count": 41,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"y_0 = for_fcst_df['Sales'].iloc[0]\n",
"y_0"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1100"
]
},
"execution_count": 42,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"y_T = for_fcst_df['Sales'].iloc[-1]\n",
"y_T"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can get the number of time periods elapsed in a similar way."
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"2"
]
},
"execution_count": 43,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"n = for_fcst_df['t'].iloc[-1] - for_fcst_df['t'].iloc[0]\n",
"n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now just calculate"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.04880884817015163"
]
},
"execution_count": 44,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cagr = (y_T / y_0)**(1 / n) - 1\n",
"cagr"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now to get the predicted value for period 3"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's wrap this up into functions."
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1153.6897329871667"
]
},
"execution_count": 47,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"def cagr_from_for_forecast_df(for_fcst_df, series_name):\n",
" \"\"\"\n",
" Calculates CAGR from DataFrame set up for forecasting\n",
" \"\"\"\n",
" y_0 = for_fcst_df[series_name].iloc[0]\n",
" y_T = for_fcst_df[series_name].iloc[-1]\n",
" n = for_fcst_df['t'].iloc[-1] - for_fcst_df['t'].iloc[0]\n",
" cagr = (y_T / y_0)**(1 / n) - 1\n",
" return cagr\n",
"\n",
"\n",
"def predict_from_for_forecast_df_and_cagr(for_fcst_df, series_name, cagr, t):\n",
" \"\"\"\n",
" Forecast value from DataFrame set up for forecasting and calculated CAGR\n",
" \"\"\"\n",
" y_T = for_fcst_df[series_name].iloc[-1]\n",
" n = for_fcst_df['t'].iloc[-1] - for_fcst_df['t'].iloc[0]\n",
" \n",
" future_nper = t - n\n",
" fcst = y_T * (1 + cagr)**future_nper\n",
" return fcst\n",
"\n",
"\n",
"cagr = cagr_from_for_forecast_df(for_fcst_df, 'Sales')\n",
"fcst_sales = predict_from_for_forecast_df_and_cagr(for_fcst_df, 'Sales', cagr, 3)\n",
"fcst_sales"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's put the entire approach in one function."
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The forecasted value for sales is $1,154 and for COGS is $666\n"
]
}
],
"source": [
"def forecast_trend_cagr(df, series_name, t):\n",
" \"\"\"\n",
" Full workflow of forecasting trend via CAGR\n",
" \"\"\"\n",
" for_fcst_df = for_forecast_df_from_orig_df(df, series_name)\n",
" cagr = cagr_from_for_forecast_df(for_fcst_df, series_name)\n",
" fcst = predict_from_for_forecast_df_and_cagr(for_fcst_df, series_name, cagr, t)\n",
" return fcst\n",
"\n",
"fcst_sales = forecast_trend_cagr(df, 'Sales', 3)\n",
"fcst_cogs = forecast_trend_cagr(df, 'Cost of Goods Sold', 3)\n",
"print(f'The forecasted value for sales is ${fcst_sales:,.0f} and for COGS is ${fcst_cogs:,.0f}')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Forecasting as a %"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can estimate COGS as a percentage of sales. To do this, we must first forecast sales, then forecast the percentage of sales, then combine the two. We already have a sales forecast from the last section, so let's keep that. Next is forecasting the percentage of sales. To do this we must first calculate the historical percentage of sales."
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
2017-12-31
\n",
"
2018-12-31
\n",
"
2019-12-31
\n",
"
\n",
" \n",
" \n",
"
\n",
"
Sales
\n",
"
1000.00
\n",
"
1200.000000
\n",
"
1100.000000
\n",
"
\n",
"
\n",
"
Cost of Goods Sold
\n",
"
620.00
\n",
"
700.000000
\n",
"
650.000000
\n",
"
\n",
"
\n",
"
COGS % Sales
\n",
"
0.62
\n",
"
0.583333
\n",
"
0.590909
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" 2017-12-31 2018-12-31 2019-12-31\n",
"Sales 1000.00 1200.000000 1100.000000\n",
"Cost of Goods Sold 620.00 700.000000 650.000000\n",
"COGS % Sales 0.62 0.583333 0.590909"
]
},
"execution_count": 50,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.loc['COGS % Sales'] = df.loc['Cost of Goods Sold'] / df.loc['Sales']\n",
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we can forecast this by any of the available methods."
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"c:\\users\\admin\\.virtualenvs\\fin-model-course-gvj3lsuv\\lib\\site-packages\\numpy\\core\\fromnumeric.py:2495: FutureWarning: Method .ptp is deprecated and will be removed in a future version. Use numpy.ptp instead.\n",
" return ptp(axis=axis, out=out, **kwargs)\n"
]
},
{
"data": {
"text/plain": [
"0.5689898989898988"
]
},
"execution_count": 53,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fcst_cogs_pct_sales = forecast_trend_reg(df, 'COGS % Sales', 3)\n",
"fcst_cogs_pct_sales"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now combine with the existing sales forecast."
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The forecasted value for sales is $1,154 and for COGS is $656\n"
]
}
],
"source": [
"fcst_cogs = fcst_sales * fcst_cogs_pct_sales\n",
"print(f'The forecasted value for sales is ${fcst_sales:,.0f} and for COGS is ${fcst_cogs:,.0f}')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## All the Approaches, Together\n",
"\n",
"There is a dizzying array of forecast options, even only considering simple forecast methods. Here is a quick overview of the approaches."
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The forecasted value for sales (average) is $1,100 and for COGS (average) is $657\n",
"The forecasted value for sales (average) is $1,100 and for COGS (% of Sales, average) is $658\n",
"The forecasted value for sales (average) is $1,100 and for COGS (recent) is $650\n",
"The forecasted value for sales (average) is $1,100 and for COGS (% of Sales, recent) is $650\n",
"The forecasted value for sales (average) is $1,100 and for COGS (trend reg) is $687\n",
"The forecasted value for sales (average) is $1,100 and for COGS (% of Sales, trend reg) is $626\n",
"The forecasted value for sales (average) is $1,100 and for COGS (trend cagr) is $666\n",
"The forecasted value for sales (average) is $1,100 and for COGS (% of Sales, trend cagr) is $635\n",
"The forecasted value for sales (recent) is $1,100 and for COGS (average) is $657\n",
"The forecasted value for sales (recent) is $1,100 and for COGS (% of Sales, average) is $658\n",
"The forecasted value for sales (recent) is $1,100 and for COGS (recent) is $650\n",
"The forecasted value for sales (recent) is $1,100 and for COGS (% of Sales, recent) is $650\n",
"The forecasted value for sales (recent) is $1,100 and for COGS (trend reg) is $687\n",
"The forecasted value for sales (recent) is $1,100 and for COGS (% of Sales, trend reg) is $626\n",
"The forecasted value for sales (recent) is $1,100 and for COGS (trend cagr) is $666\n",
"The forecasted value for sales (recent) is $1,100 and for COGS (% of Sales, trend cagr) is $635\n",
"The forecasted value for sales (trend reg) is $1,200 and for COGS (average) is $657\n",
"The forecasted value for sales (trend reg) is $1,200 and for COGS (% of Sales, average) is $718\n",
"The forecasted value for sales (trend reg) is $1,200 and for COGS (recent) is $650\n",
"The forecasted value for sales (trend reg) is $1,200 and for COGS (% of Sales, recent) is $709\n",
"The forecasted value for sales (trend reg) is $1,200 and for COGS (trend reg) is $687\n",
"The forecasted value for sales (trend reg) is $1,200 and for COGS (% of Sales, trend reg) is $683\n",
"The forecasted value for sales (trend reg) is $1,200 and for COGS (trend cagr) is $666\n",
"The forecasted value for sales (trend reg) is $1,200 and for COGS (% of Sales, trend cagr) is $692\n",
"The forecasted value for sales (trend cagr) is $1,154 and for COGS (average) is $657\n",
"The forecasted value for sales (trend cagr) is $1,154 and for COGS (% of Sales, average) is $690\n",
"The forecasted value for sales (trend cagr) is $1,154 and for COGS (recent) is $650\n",
"The forecasted value for sales (trend cagr) is $1,154 and for COGS (% of Sales, recent) is $682\n",
"The forecasted value for sales (trend cagr) is $1,154 and for COGS (trend reg) is $687\n",
"The forecasted value for sales (trend cagr) is $1,154 and for COGS (% of Sales, trend reg) is $656\n",
"The forecasted value for sales (trend cagr) is $1,154 and for COGS (trend cagr) is $666\n",
"The forecasted value for sales (trend cagr) is $1,154 and for COGS (% of Sales, trend cagr) is $666\n"
]
}
],
"source": [
"def forecast_by_method(df, series_name, method, t):\n",
" if method == 'average':\n",
" return df.loc[series_name].mean()\n",
" elif method == 'recent':\n",
" last_date = df.columns.max()\n",
" return df.loc[series_name][last_date]\n",
" elif method == 'trend reg':\n",
" return forecast_trend_reg(df, series_name, t)\n",
" elif method == 'trend cagr':\n",
" return forecast_trend_cagr(df, series_name, t)\n",
"\n",
"methods = [\n",
" 'average',\n",
" 'recent',\n",
" 'trend reg',\n",
" 'trend cagr'\n",
"]\n",
"\n",
"t = 3\n",
"\n",
"cogs_forecasts = []\n",
"for sales_method in methods:\n",
" fcst_sales = forecast_by_method(df, 'Sales', sales_method, t)\n",
" for cogs_method in methods:\n",
" # Handle levels for COGS\n",
" fcst_cogs = forecast_by_method(df, 'Cost of Goods Sold', cogs_method, t)\n",
" cogs_forecasts.append(fcst_cogs)\n",
" print(f'The forecasted value for sales ({sales_method}) is ${fcst_sales:,.0f} and for COGS ({cogs_method}) is ${fcst_cogs:,.0f}')\n",
" # Handle % of sales for COGS\n",
" fcst_cogs_pct = forecast_by_method(df, 'COGS % Sales', cogs_method, t)\n",
" fcst_cogs = fcst_cogs_pct * fcst_sales\n",
" cogs_forecasts.append(fcst_cogs)\n",
" print(f'The forecasted value for sales ({sales_method}) is ${fcst_sales:,.0f} and for COGS (% of Sales, {cogs_method}) is ${fcst_cogs:,.0f}')"
]
},
{
"cell_type": "code",
"execution_count": 58,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"execution_count": 58,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXcAAAD4CAYAAAAXUaZHAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAMcUlEQVR4nO3dX4il9X3H8fenblZiQDfNjq1116zSVUiaRNep0UJtG7FRU9xeWFhvFFvYNFipgbZZCbTNRcCmhaKUChIjChJRiY1UY2JzUejFakajpv5LNkaz01UzFjRFU43ptxfziMf1uHNmd2aP+/X9gsOc8zu/Z+Y7N+85PPPMmVQVkqRefmnaA0iSVp5xl6SGjLskNWTcJakh4y5JDa2Z9gAA69evr02bNk17DEk6pNx///3PV9XMuOfeEXHftGkTc3Nz0x5Dkg4pSZ5+u+eWPC2T5KQkD47cfprk8iR/n+TxJA8nuT3JupFjrkiyK8kTST65Ut+IJGkyS8a9qp6oqpOr6mTgVOBl4HbgHuA3quqjwPeBKwCSfAjYBnwYOAf45ySHrdL8kqQxlvsL1bOAH1bV01X1rap6bVjfCWwY7m8Fbq6qV6rqR8Au4LSVGVeSNInlxn0b8NUx638MfGO4fyywe+S5+WHtTZJsTzKXZG5hYWGZY0iS9mXiuCdZC5wP3LrX+ueB14CbXl8ac/hb3sCmqq6tqtmqmp2ZGfvLXknSflrO1TLnAg9U1XOvLyS5GPgD4Kx64x3I5oGNI8dtAPYc6KCSpMkt57TMhYyckklyDvA54Pyqenlk3x3AtiSHJzke2AzctxLDSpImM9Er9yRHAGcDnx5Z/ifgcOCeJAA7q+pPq+qRJLcAj7J4uubSqvrFyo4tSdqXieI+vDL/wF5rv76P/V8Evnhgo0krb3ghsur8PwmaNt9bRu8qVbXs2wc/96/LPkaaNuMuSQ0Zd0lqyLhLUkPGXZIaMu6S1JBxl6SGjLskNWTcJakh4y5JDRl3SWrIuEtSQ8Zdkhoy7pLUkHGXpIaMuyQ1ZNwlqSHjLkkNGXdJasi4S1JDxl2SGjLuktSQcZekhoy7JDVk3CWpIeMuSQ0Zd0lqyLhLUkPGXZIaMu6S1JBxl6SGjLskNWTcJakh4y5JDS0Z9yQnJXlw5PbTJJcn+eUk9yT5wfDx/cP+JLk6ya4kDyfZsvrfhiRp1JJxr6onqurkqjoZOBV4Gbgd2AF8u6o2A98eHgOcC2webtuBa1ZjcEnS21vuaZmzgB9W1dPAVuCGYf0G4A+H+1uBG2vRTmBdkmNWZFpJ0kSWG/dtwFeH+79SVc8ADB+PHtaPBXaPHDM/rL1Jku1J5pLMLSwsLHMMSdK+TBz3JGuB84Fbl9o6Zq3eslB1bVXNVtXszMzMpGNIkiawnFfu5wIPVNVzw+PnXj/dMnz8ybA+D2wcOW4DsOdAB5UkTW45cb+QN07JANwBXDzcvxj4+sj6RcNVM6cDL75++kaSdHCsmWRTkiOAs4FPjyxfCdyS5E+AHwN/NKzfBZwH7GLxyppLVmxaSdJEJop7Vb0MfGCvtf9m8eqZvfcWcOmKTCdJ2i/+haokNWTcJakh4y5JDRl3SWrIuEtSQ8Zdkhoy7pLUkHGXpIaMuyQ1ZNwlqSHjLkkNGXdJasi4S1JDxl2SGjLuktSQcZekhoy7JDVk3CWpIeMuSQ0Zd0lqyLhLUkPGXZIaMu6S1JBxl6SGjLskNWTcJakh4y5JDRl3SWrIuEtSQ2umPYC0vz72hW/x4s9+flC+1qYdd67q5z/qve/hob/5/VX9Gnp3Me46ZL34s5/z1JWfmvYYK2K1f3jo3cfTMpLUkHGXpIaMuyQ1NFHck6xLcluSx5M8luSMJCcn2ZnkwSRzSU4b9ibJ1Ul2JXk4yZbV/RYkSXub9BeqVwF3V9UFSdYCRwC3AF+oqm8kOQ/4EvC7wLnA5uH2ceCa4aMk6SBZ8pV7kiOBM4HrAKrq1ap6ASjgyGHbUcCe4f5W4MZatBNYl+SYFZ9ckvS2JnnlfgKwAFyf5GPA/cCfA5cD30zyDyz+kPitYf+xwO6R4+eHtWdGP2mS7cB2gOOOO+4AvgVJ0t4mOee+BtgCXFNVpwAvATuAzwCfraqNwGcZXtkDGfM56i0LVddW1WxVzc7MzOzX8JKk8SaJ+zwwX1X3Do9vYzH2FwNfG9ZuBU4b2b9x5PgNvHHKRpJ0ECwZ96p6Ftid5KRh6SzgURaD/TvD2ieAHwz37wAuGq6aOR14saredEpGkrS6Jr1a5jLgpuFKmSeBS4CvA1clWQP8L8P5c+Au4DxgF/DysFeSdBBNFPeqehCY3Wv5P4BTx+wt4NIDH02StL/8C1VJasi4S1JDxl2SGjLuktSQcZekhoy7JDVk3CWpIeMuSQ0Zd0lqyLhLUkPGXZIaMu6S1JBxl6SGjLskNWTcJakh4y5JDRl3SWrIuEtSQ8Zdkhoy7pLUkHGXpIaMuyQ1ZNwlqSHjLkkNGXdJasi4S1JDxl2SGjLuktSQcZekhoy7JDVk3CWpIeMuSQ0Zd0lqyLhLUkMTxT3JuiS3JXk8yWNJzhjWL0vyRJJHknxpZP8VSXYNz31ytYaXJI23ZsJ9VwF3V9UFSdYCRyT5PWAr8NGqeiXJ0QBJPgRsAz4M/Brwb0lOrKpfrML8kqQxlnzlnuRI4EzgOoCqerWqXgA+A1xZVa8M6z8ZDtkK3FxVr1TVj4BdwGmrMbwkabxJTsucACwA1yf5bpIvJ3kfcCLw20nuTfLvSX5z2H8ssHvk+Plh7U2SbE8yl2RuYWHhAL8NSdKoSeK+BtgCXFNVpwAvATuG9fcDpwN/CdySJEDGfI56y0LVtVU1W1WzMzMz+zu/JGmMSeI+D8xX1b3D49tYjP088LVadB/wf8D6YX3jyPEbgD0rN7IkaSlLxr2qngV2JzlpWDoLeBT4F+ATAElOBNYCzwN3ANuSHJ7keGAzcN8qzC5JehuTXi1zGXDTcKXMk8AlLJ6e+UqS/wReBS6uqgIeSXILiz8AXgMu9UoZSTq4stjj6Zqdna25ublpj6FDzEdu+Mi0R1hR37v4e9MeQYeYJPdX1ey45yZ95S694/zPY1fy1JWfmvYYK2LTjjunPYKa8e0HJKkh4y5JDRl3SWrIuEtSQ8Zdkhoy7pLUkHGXpIaMuyQ1ZNwlqSHjLkkNGXdJasi4S1JDxl2SGjLuktSQcZekhnw/dx3SurwP+lHvfc+0R1Azxl2HrIP1jzo27bizzT8F0buHp2UkqSHjLkkNGXdJasi4S1JDxl2SGjLuktSQcZekhoy7JDVk3CWpIeMuSQ0Zd0lqyLhLUkPGXZIaMu6S1JBxl6SGjLskNTRR3JOsS3JbkseTPJbkjJHn/iJJJVk/PE6Sq5PsSvJwki2rNbwkabxJ/xPTVcDdVXVBkrXAEQBJNgJnAz8e2XsusHm4fRy4ZvgoSTpIlnzlnuRI4EzgOoCqerWqXhie/kfgr4AaOWQrcGMt2gmsS3LMyo4tSdqXSU7LnAAsANcn+W6SLyd5X5Lzgf+qqof22n8ssHvk8fywJkk6SCaJ+xpgC3BNVZ0CvAT8LfB54K/H7M+YtXrLpmR7krkkcwsLC5NPLEla0iRxnwfmq+re4fFtLMb+eOChJE8BG4AHkvzqsH/jyPEbgD17f9KquraqZqtqdmZm5gC+BUnS3paMe1U9C+xOctKwdBbwQFUdXVWbqmoTi0HfMuy9A7houGrmdODFqnpmleaXJI0x6dUylwE3DVfKPAlcso+9dwHnAbuAl5fYK0laBRPFvaoeBGb38fymkfsFXHrAk0mS9pt/oSpJDRl3SWrIuEtSQ8Zdkhoy7pLUkHGXpIaMuyQ1ZNwlqSHjLkkNGXdJasi4S1JDxl2SGjLuktSQcZekhoy7JDVk3CWpIeMuSQ0Zd0lqyLhLUkPGXZIaMu6S1JBxl6SGjLskNWTcJakh4y5JDa2Z9gDSwZRk/477u+Xtr6r9+jrSSjHuelcxunq38LSMJDVk3CWpIeMuSQ0Zd0lqyLhLUkPGXZIaMu6S1JBxl6SG8k74o44kC8DT055DehvrgeenPYQ0xgerambcE++IuEvvZEnmqmp22nNIy+FpGUlqyLhLUkPGXVratdMeQFouz7lLUkO+cpekhoy7JDVk3KW3keScJE8k2ZVkx7TnkZbDc+7SGEkOA74PnA3MA98BLqyqR6c6mDQhX7lL450G7KqqJ6vqVeBmYOuUZ5ImZtyl8Y4Fdo88nh/WpEOCcZfGy5g1z2HqkGHcpfHmgY0jjzcAe6Y0i7Rsxl0a7zvA5iTHJ1kLbAPumPJM0sTWTHsA6Z2oql5L8mfAN4HDgK9U1SNTHkuamJdCSlJDnpaRpIaMuyQ1ZNwlqSHjLkkNGXdJasi4S1JDxl2SGvp/v/AdtSYzyHMAAAAASUVORK5CYII=\n",
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"pd.DataFrame(cogs_forecasts).plot.box()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
}
},
"nbformat": 4,
"nbformat_minor": 4
}